Single document Summarization based on Clustering Coefficient and Transitivity Analysis

Authors

  • Yanting Li
  • Kai Cheng
Abstract

Document summarization is a technique that aims to automatically extract the main ideas from electronic documents. With the rapid increase in electronic documents available on the network, techniques for making efficient use of such documents are becoming increasingly important. In this paper, we propose a novel algorithm, called TriangleSum, for single-document summarization based on graph theory. The algorithm builds a dependency graph for the document based on syntactic dependency relation analysis. The nodes represent high-frequency words or phrases, and the edges represent dependency relations between them. Then, a modified version of the clustering coefficient is used to measure the strength of connection between nodes in the graph. By identifying triangles of nodes, a part of the dependency graph can be extracted. Finally, a set of key sentences that represent the main information of the document can be extracted.

Keywords: summarization, dependency graph, clustering coefficient, transitivity analysis
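To make the graph notions in the abstract concrete, here is a minimal Python sketch. It is not the TriangleSum algorithm: the abstract does not define its modified clustering coefficient, so the code uses the standard local clustering coefficient and the standard global transitivity instead. The function names (`build_graph`, `local_clustering`, `find_triangles`, `transitivity`) and the toy word graph are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map from (u, v) pairs, skipping self-loops."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj, node):
    """Standard local clustering coefficient (NOT the paper's modified version):
    linked neighbor pairs divided by all possible neighbor pairs."""
    neighbors = adj.get(node, set())
    k = len(neighbors)
    if k < 2:
        return 0.0
    linked = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return 2.0 * linked / (k * (k - 1))


def find_triangles(adj):
    """Return every triangle exactly once, as a sorted tuple of node names."""
    found = set()
    for u in adj:
        for v, w in combinations(adj[u], 2):
            if w in adj[v]:
                found.add(tuple(sorted((u, v, w))))
    return sorted(found)


def transitivity(adj):
    """Global transitivity: 3 * (number of triangles) / (number of connected triples)."""
    triples = sum(len(n) * (len(n) - 1) // 2 for n in adj.values())
    if triples == 0:
        return 0.0
    return 3.0 * len(find_triangles(adj)) / triples


if __name__ == "__main__":
    # Hypothetical dependency edges between frequent words (illustrative only).
    toy_edges = [
        ("document", "summarization"),
        ("summarization", "graph"),
        ("document", "graph"),
        ("graph", "node"),
    ]
    graph = build_graph(toy_edges)
    print(find_triangles(graph))
    print(local_clustering(graph, "graph"))
    print(transitivity(graph))
```

In a sketch like this, words that take part in triangles form the densely connected part of the graph; sentences containing those words would be natural candidates for key sentences, although the abstract does not spell out how that selection is done.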

Similar resources

Key Sentence Extraction from Single Document based on Triangle Analysis in Dependency Graph

Document summarization is a technique that aims to automatically extract the main ideas from electronic documents. In this paper, we propose a novel algorithm, called TriangleSum, for key sentence extraction from a single document based on graph theory. The algorithm builds a dependency graph for the underlying document based on co-occurrence relations as well as syntactic dependency relations. The nodes r...

A survey on Automatic Text Summarization

Text summarization endeavors to produce a summary version of a text while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to sift through such a massive amount of data in order to extract the useful information is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...

Arabic text summarization based on latent semantic analysis to enhance arabic documents clustering

Arabic document clustering is an important task for obtaining good results with traditional Information Retrieval (IR) systems, especially given the rapid growth in the number of online documents in Arabic. Document clustering aims to automatically group similar documents into one cluster using different similarity/distance measures. This task is often affected by the document...

EXTRACTION-BASED TEXT SUMMARIZATION USING FUZZY ANALYSIS

Due to the explosive growth of the world-wide web, automatic text summarization has become an essential tool for web users. In this paper we present a novel approach for creating text summaries. Using fuzzy logic and word-net, our model extracts the most relevant sentences from an original document. The approach utilizes fuzzy measures and inference on the extracted textual information from the docu...

Simultaneous Clustering and Noise Detection for Theme-based Summarization

Multi-document summarization aims to produce a concise summary that contains salient information from a set of source documents. Since documents often cover a number of topical themes, with each theme represented by a cluster of highly related sentences, sentence clustering plays a pivotal role in theme-based summarization. Moreover, noting that real-world datasets always contain noise which ine...

Publication date: 2011